Exploring the Price of a Diamond Based on Features
Scatterplot - Carat vs. Price

GGpairs - Feature Correlation

The Demand of Diamonds

Scatterplot - Carat vs. Price


Price vs. Carat and Clarity
##
##  0.3 0.31 1.01  0.7 0.32    1
## 2604 2249 2242 1981 1840 1558
##
##  605  802  625  828  776  698
##  132  127  126  125  124  121
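
The two tables above read like frequency counts of the most common carat and price values. A minimal sketch of how such counts can be produced, assuming the `diamonds` data set that ships with ggplot2 (the variable names `top_carats` and `top_prices` are illustrative, not taken from the original analysis):

```r
# Load the diamonds data set bundled with ggplot2 (assumed to be installed)
data("diamonds", package = "ggplot2")

# Count how often each value occurs, sort by frequency,
# and keep the six most common values
top_carats <- head(sort(table(diamonds$carat), decreasing = TRUE))
top_prices <- head(sort(table(diamonds$price), decreasing = TRUE))

print(top_carats)
print(top_prices)
```

In each table the first row holds the value and the second row holds how many diamonds have it.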

Price vs. Carat and Cut

Price vs. Carat and Color

Building the Linear Model for Price
##
## Calls:
## m1: lm(formula = I(log(price)) ~ I(carat^(1/3)), data = diamonds)
## m2: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat, data = diamonds)
## m3: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut, data = diamonds)
## m4: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color,
## data = diamonds)
## m5: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color +
## clarity, data = diamonds)
##
## =================================================================================
##                      m1           m2           m3           m4           m5
## ---------------------------------------------------------------------------------
## (Intercept)          2.821***     1.039***     0.874***     0.932***     0.415***
##                     (0.006)      (0.019)      (0.019)      (0.017)      (0.010)
## I(carat^(1/3))       5.558***     8.568***     8.703***     8.438***     9.144***
##                     (0.007)      (0.032)      (0.031)      (0.028)      (0.016)
## carat                            -1.137***    -1.163***    -0.992***    -1.093***
##                                  (0.012)      (0.011)      (0.010)      (0.006)
## cut: .L                                        0.224***     0.224***     0.120***
##                                               (0.004)      (0.004)      (0.002)
## cut: .Q                                       -0.062***    -0.062***    -0.031***
##                                               (0.004)      (0.003)      (0.002)
## cut: .C                                        0.051***     0.052***     0.014***
##                                               (0.003)      (0.003)      (0.002)
## cut: ^4                                        0.018***     0.018***    -0.002
##                                               (0.003)      (0.002)      (0.001)
## color: .L                                                  -0.373***    -0.441***
##                                                            (0.003)      (0.002)
## color: .Q                                                  -0.129***    -0.093***
##                                                            (0.003)      (0.002)
## color: .C                                                   0.001       -0.013***
##                                                            (0.003)      (0.002)
## color: ^4                                                   0.029***     0.012***
##                                                            (0.003)      (0.002)
## color: ^5                                                  -0.016***    -0.003*
##                                                            (0.003)      (0.001)
## color: ^6                                                  -0.023***     0.001
##                                                            (0.002)      (0.001)
## clarity: .L                                                              0.907***
##                                                                         (0.003)
## clarity: .Q                                                             -0.240***
##                                                                         (0.003)
## clarity: .C                                                              0.131***
##                                                                         (0.003)
## clarity: ^4                                                             -0.063***
##                                                                         (0.002)
## clarity: ^5                                                              0.026***
##                                                                         (0.002)
## clarity: ^6                                                             -0.002
##                                                                         (0.002)
## clarity: ^7                                                              0.032***
##                                                                         (0.001)
## ---------------------------------------------------------------------------------
## R-squared            0.924        0.935        0.939        0.951        0.984
## adj. R-squared       0.924        0.935        0.939        0.951        0.984
## sigma                0.280        0.259        0.250        0.224        0.129
## F               652012.063   387489.366   138654.523    87959.467   173791.084
## p                    0.000        0.000        0.000        0.000        0.000
## Log-likelihood   -7962.499    -3631.319    -1837.416     4235.240    34091.272
## Deviance          4242.831     3613.360     3380.837     2699.212      892.214
## AIC              15930.999     7270.637     3690.832    -8442.481   -68140.544
## BIC              15957.685     7306.220     3761.997    -8317.942   -67953.736
## N                53940        53940        53940        53940        53940
## =================================================================================
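
The five formulas listed under "Calls" form a nested sequence, each adding one predictor to the previous model. A minimal sketch of how such a sequence can be fitted, assuming the `diamonds` data set from ggplot2; the use of `update()` and of `memisc::mtable()` for the side-by-side table is an assumption based on the output layout, not something the report states:

```r
# Load the diamonds data set bundled with ggplot2 (assumed to be installed)
data("diamonds", package = "ggplot2")

# Start from log(price) on the cube root of carat, then add one term at a time
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)

# Side-by-side comparison table; skipped if memisc is not installed
if (requireNamespace("memisc", quietly = TRUE)) {
  print(memisc::mtable(m1, m2, m3, m4, m5))
}
```

Because `cut`, `color`, and `clarity` are ordered factors in this data set, R fits them with polynomial contrasts, which is why the coefficient rows are labelled `.L`, `.Q`, `.C`, `^4`, and so on rather than by level name.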
##  [1] "X"            "carat"        "cut"          "color"
##  [5] "clarity"      "table"        "depth"        "cert"
##  [9] "measurements" "price"        "x"            "y"
## [13] "z"
##
## Calls:
## m1: lm(formula = I(logprice ~ I(carat^(1/3))), data = diamonds_big[diamonds_big$price <
## 10000 & diamonds_big$cert == "GIA", ])
## m2: lm(formula = logprice ~ I(carat^(1/3)) + carat, data = diamonds_big[diamonds_big$price <
## 10000 & diamonds_big$cert == "GIA", ])
## m3: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut, data = diamonds_big[diamonds_big$price <
## 10000 & diamonds_big$cert == "GIA", ])
## m4: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color,
## data = diamonds_big[diamonds_big$price < 10000 & diamonds_big$cert ==
## "GIA", ])
## m5: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color +
## clarity, data = diamonds_big[diamonds_big$price < 10000 &
## diamonds_big$cert == "GIA", ])
##
## ======================================================================================
##                       m1            m2            m3            m4            m5
## --------------------------------------------------------------------------------------
## (Intercept)           2.671***      1.333***      0.949***      1.341***      0.665***
##                      (0.003)       (0.012)       (0.012)       (0.010)       (0.007)
## I(carat^(1/3))        5.839***      8.243***      8.633***      8.110***      8.320***
##                      (0.004)       (0.022)       (0.021)       (0.017)       (0.012)
## carat                              -1.061***     -1.223***     -0.782***     -0.763***
##                                    (0.009)       (0.009)       (0.007)       (0.005)
## cut: Ideal                                        0.211***      0.181***      0.131***
##                                                  (0.002)       (0.001)       (0.001)
## cut: V.Good                                       0.120***      0.090***      0.071***
##                                                  (0.002)       (0.001)       (0.001)
## color: E/D                                                     -0.083***     -0.071***
##                                                                (0.001)       (0.001)
## color: F/D                                                     -0.125***     -0.105***
##                                                                (0.001)       (0.001)
## color: G/D                                                     -0.178***     -0.162***
##                                                                (0.001)       (0.001)
## color: H/D                                                     -0.243***     -0.225***
##                                                                (0.002)       (0.001)
## color: I/D                                                     -0.361***     -0.358***
##                                                                (0.002)       (0.001)
## color: J/D                                                     -0.500***     -0.509***
##                                                                (0.002)       (0.001)
## color: K/D                                                     -0.689***     -0.710***
##                                                                (0.002)       (0.002)
## color: L/D                                                     -0.812***     -0.827***
##                                                                (0.003)       (0.002)
## clarity: I2                                                                  -0.301***
##                                                                              (0.006)
## clarity: IF                                                                   0.751***
##                                                                              (0.002)
## clarity: SI1                                                                  0.426***
##                                                                              (0.002)
## clarity: SI2                                                                  0.306***
##                                                                              (0.002)
## clarity: VS1                                                                  0.590***
##                                                                              (0.002)
## clarity: VS2                                                                  0.534***
##                                                                              (0.002)
## clarity: VVS1                                                                 0.693***
##                                                                              (0.002)
## clarity: VVS2                                                                 0.633***
##                                                                              (0.002)
## --------------------------------------------------------------------------------------
## R-squared             0.888         0.892         0.899         0.937         0.969
## adj. R-squared        0.888         0.892         0.899         0.937         0.969
## sigma                 0.289         0.284         0.275         0.216         0.154
## F               2700903.714   1406538.330    754405.425    423311.488    521161.443
## p                     0.000         0.000         0.000         0.000         0.000
## Log-likelihood   -60137.791    -53996.269    -43339.818     37830.414    154124.270
## Deviance          28298.689     27291.534     25628.285     15874.910      7992.720
## AIC              120281.582    108000.539     86691.636    -75632.827   -308204.540
## BIC              120313.783    108043.473     86756.037    -75482.557   -307968.400
## N                338946        338946        338946        338946        338946
## ======================================================================================
Predictions
##        fit     lwr      upr
## 1 5040.436 3730.34 6810.638
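
The models are fitted to log(price), so a prediction has to be exponentiated to land back on the dollar scale, as the `fit`, `lwr`, and `upr` values above appear to be. A minimal sketch of that step, assuming the ggplot2 `diamonds` data and the full `m5` formula; the diamond described in `new_diamond` is hypothetical, and because this sketch does not use the larger `diamonds_big` data, its numbers are not expected to match the interval shown above:

```r
# Load the diamonds data set bundled with ggplot2 (assumed to be installed)
data("diamonds", package = "ggplot2")

# Full model: log(price) on carat terms plus cut, color, and clarity
m5 <- lm(I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + clarity,
         data = diamonds)

# Hypothetical diamond to price; factor levels must match the training data
new_diamond <- data.frame(
  carat   = 1.00,
  cut     = factor("Very Good", levels = levels(diamonds$cut),     ordered = TRUE),
  color   = factor("I",         levels = levels(diamonds$color),   ordered = TRUE),
  clarity = factor("VS1",       levels = levels(diamonds$clarity), ordered = TRUE)
)

# 95% prediction interval on the log scale, then back-transformed to dollars
log_pred   <- predict(m5, newdata = new_diamond,
                      interval = "prediction", level = 0.95)
price_pred <- exp(log_pred)
print(price_pred)
```

The interval is asymmetric around `fit` on the dollar scale because it is symmetric on the log scale before `exp()` is applied.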